SRE for Mobile Application (SREcon18 Europe)

https://www.youtube.com/watch?v=4pQP8xiiMgo

Abstract

In the server side world, we can and do lean heavily on redundancy, scaling, and direct control to engineer reliability; however, in the mobile world these well known facets of SRE (amongst others) are virtually non-existent. We typically have no binary rollbacks or downgrades, no forced updates/upgrades, and no ability to turn it off and back on again. Users rely on mobile applications more and more, and it's important that SREs consider client-side reliability. This talk discusses practices, principles, and processes that can be applied in the mobile world to make client-side code a first-class citizen, along with real-world case studies of what has and hasn't worked.

https://gyazo.com/7fe3779355a0024dfcfccf904bd8a1da

There is no capacity planning

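SRE for the client

Upsides

The client is the UX itself

Richer metrics than on the server side

Room for improvement

Shaving 50 ms off on the server side is very hard, but on the client it might be quick to do

The client side becomes one of the places where you can optimize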

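Downsides

The architecture is binary distribution, so there is no direct control

No rollbacks

Does not scale (each copy runs on its own device)

Endless combinations of environments

OS, device, version

Perfect monitoring is impossible (all kinds of situations can occur)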

Monitoring

Imperfect monitoring is still better than no monitoring

Real time

The client calls an API

The API writes the log

Looking at server-side metrics is also fine
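
The "client calls an API, the API writes the log" flow can be sketched roughly as below. This is only an illustration, not what the talk showed: the function name `log_client_event` and the field names in `REQUIRED_FIELDS` are assumptions, chosen so that each log line carries the OS, device, and version dimensions mentioned above.

```python
import json
import time
from typing import Optional

# Hypothetical schema: the dimensions every client report must carry.
REQUIRED_FIELDS = ("event", "os", "device", "app_version")


def log_client_event(payload: dict, now: Optional[float] = None) -> str:
    """Validate one client report and return it as a structured log line."""
    missing = [f for f in REQUIRED_FIELDS if f not in payload]
    if missing:
        raise ValueError(f"missing fields: {missing}")
    # Keep only the known fields so the log format stays predictable.
    record = {f: payload[f] for f in REQUIRED_FIELDS}
    record["received_at"] = time.time() if now is None else now
    return json.dumps(record, sort_keys=True)
```

In a real service this function would sit behind an HTTP endpoint and the returned line would go to the logging pipeline; both are left out here.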

Caveats

Coordinarity: collect logs that are actually meaningful

Cost: battery and data usage are a trade-off

Reliability: for example, send again later when a send failed
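
A minimal sketch of the Cost and Reliability points, assuming a client that batches events to save battery and data, and keeps unsent events for a later retry. `TelemetryBuffer`, its parameters, and the injected `send` callable are all hypothetical names, and Python is used here only as a neutral language for the idea, not as the app's language.

```python
from collections import deque
from typing import Callable, Deque, List


class TelemetryBuffer:
    """Queues events, sends them in batches, and keeps unsent ones for retry."""

    def __init__(self, send: Callable[[List[dict]], bool],
                 batch_size: int = 20, max_pending: int = 500) -> None:
        self._send = send              # returns True when the upload succeeded
        self._batch_size = batch_size  # fewer, larger uploads cost less battery
        # Bounded queue: when full, the oldest events are dropped first.
        self._pending: Deque[dict] = deque(maxlen=max_pending)

    def record(self, event: dict) -> None:
        self._pending.append(event)

    def flush(self) -> int:
        """Try to send pending events in batches; return how many were sent."""
        sent = 0
        while self._pending:
            size = min(self._batch_size, len(self._pending))
            batch = [self._pending[i] for i in range(size)]
            try:
                ok = self._send(batch)
            except OSError:
                ok = False
            if not ok:
                break  # keep the events; a later flush retries them
            for _ in batch:
                self._pending.popleft()
            sent += len(batch)
        return sent

    @property
    def pending(self) -> int:
        return len(self._pending)
```

A caller would invoke `flush()` on a timer or when connectivity returns; how often is exactly the battery and data trade-off noted above.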

CI for production monitoring

Blackbox probe

Run UI tests against the production server with an emulator

If you already have UI tests, you can reuse them

Lower priority
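
As a rough sketch of the blackbox probe idea: wrap an existing end-to-end check, run it on a schedule from CI, and record whether it passed and how long it took. `run_probe` and the `check` callable are hypothetical; in practice `check` would launch the UI test suite on an emulator pointed at production, which is not shown here.

```python
import time
from typing import Callable, Dict


def run_probe(name: str, check: Callable[[], None],
              clock: Callable[[], float] = time.monotonic) -> Dict[str, object]:
    """Run one end-to-end check and report pass/fail and duration."""
    start = clock()
    try:
        check()  # hypothetical: e.g. a function that runs the UI tests
        ok, error = True, ""
    except Exception as exc:  # any failure counts as a failed probe
        ok, error = False, f"{type(exc).__name__}: {exc}"
    return {"probe": name, "ok": ok, "error": error,
            "seconds": clock() - start}
```

The resulting dictionary can be exported as a metric or log line, so a failing probe alerts the same way a server-side check would.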

System and app state

https://gyazo.com/324837c4613eeaecd5168e688e6730ec

OK

Nearly 100 people work on the app, and there are 50 to 100 flags
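
The notes do not say how these flags are implemented. One common approach, shown here purely as an assumption, is a server-controlled percentage rollout: since a shipped binary cannot be rolled back, a feature is turned down by lowering its percentage in the config the app fetches. `bucket`, `is_enabled`, and the `rollout` dictionary are hypothetical names.

```python
import hashlib


def bucket(user_id: str, flag: str) -> int:
    """Map (flag, user) to a stable bucket in the range 0..99."""
    digest = hashlib.sha256(f"{flag}:{user_id}".encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % 100


def is_enabled(flag: str, user_id: str, rollout: dict) -> bool:
    """rollout maps flag name -> percentage of users (0 to 100).

    Flags missing from the config are treated as off.
    """
    percent = rollout.get(flag, 0)
    return bucket(user_id, flag) < percent
```

Hashing the flag name together with the user ID keeps each user's result stable across sessions while giving different flags independent user samples.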

#wip #MRE